Ref #  Hits   Search Query                          DBs                                                 Default Operator  Plurals  Time Stamp
L2     2      ("6285999").PN.                       US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                OFF      2006/01/07 12:02
L3     22855  (707/1-5,7,10,100,104.1).CCLS.        US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                OFF      2006/01/07 12:02
L4     1206   (715/501.1).CCLS.                     US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                OFF      2006/01/07 12:02
L5     262    3 and 4                               US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2006/01/07 12:03
L6     23799  3 or 4                                US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2006/01/07 12:03
L7     1907   document with link$3.clm.             US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2006/01/07 12:03
L8     34     document near4 pointed.clm.           US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2006/01/07 12:04
L9     331    document with score.clm.              US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2006/01/07 12:04
L10    22     8 and 7                               US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2006/01/07 12:05
L11    6      8 and 9                               US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2006/01/07 12:06
L12    55     7 and 9                               US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2006/01/07 12:06
L13    51     6 and (10 or 11 or 12)                US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2006/01/07 12:06
L14    44     13 and (assign$3 or determin$3).clm.  US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2006/01/07 12:09

Ref #  Hits   Search Query                          DBs                                                 Default Operator  Plurals  Time Stamp
S34    4      S30 and S26                           US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2005/12/05 10:42
S32    13     S30 and S27                           US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2005/12/02 17:08
S30    22081  S28 or S29                            US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2005/12/02 17:07
S29    1185   (715/501.1).CCLS.                     US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                OFF      2005/12/02 17:07
S23    325    document with score.clm.              US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2005/12/02 17:07
S28    21138  (707/1-3,7,10,100,104.1).CCLS.        US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                OFF      2005/12/02 17:06
S26    6      S25 and S23                           US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2005/12/02 17:05
S27    22     S25 and S24                           US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2005/12/02 16:59
S25    33     document near4 pointed.clm.           US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2005/12/02 16:59
S24    1856   document with link$3.clm.             US-PGPUB; USPAT; USOCR; EPO; JPO; DERWENT; IBM_TDB  OR                ON       2005/12/02 16:58
1 Research session: new applications: The SphereSearch engine for unified ranked
retrieval of heterogeneous XML and web documents
Jens Graupmann, Ralf Schenkel, Gerhard Weikum 

August 2005 Proceedings of the 31st international conference on Very large data 
bases VLDB '05 

Publisher: VLDB Endowment 


This paper presents the novel SphereSearch Engine that provides unified ranked retrieval 
on heterogeneous XML and Web data. Its search capabilities include vague structure 
conditions, text content conditions, and relevance ranking based on IR statistics and 
statistically quantified ontological relationships. Web pages in HTML or PDF are 
automatically converted into XML format, with the option of generating semantic tags by 
means of linguistic annotation tools. For Web data the XML-oriented query ... 



2 Probabilistic combination of content and links
Rong Jin, Susan Dumais 

September 2001 Proceedings of the 24th annual international ACM SIGIR conference 
on Research and development in information retrieval 

Publisher: ACM Press 




Previous research has shown that citations and hypertext links can be usefully combined 
with document content to improve retrieval. Links can be used in many ways, e.g., link 
topology can be used to identify important pages, anchor text can be used to augment the 
text of cited pages, and activation can be spread to linked pages. This paper introduces a 
probabilistic model that integrates content matching and these three uses of link 
information in a single unified framework. Experiments ... 



3 Learning associative Markov networks

Ben Taskar, Vassil Chatalbashev, Daphne Koller 

July 2004 Proceedings of the twenty-first international conference on Machine 
learning ICML '04 

Publisher: ACM Press 


Markov networks are extensively used to model complex sequential, spatial, and relational 
interactions in fields as diverse as image processing, natural language analysis, and 
bioinformatics. However, inference and learning in general Markov networks is intractable. 
In this paper, we focus on learning a large subclass of such models (called associative 
Markov networks) that are tractable or closely approximate. This subclass contains 
networks of discrete variables with K labels ea ... 
4 Web search 3: Improving web search results using affinity graph

Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, Wei-Ying Ma 

August 2005 Proceedings of the 28th annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '05
Publisher: ACM Press 


In this paper, we propose a novel ranking scheme named Affinity Ranking (AR) to re-rank 
search results by optimizing two metrics: (1) diversity — which indicates the variance of 
topics in a group of documents; (2) information richness — which measures the coverage 
of a single document to its topic. Both of the two metrics are calculated from a directed 
link graph named Affinity Graph (AG). AG models the structure of a group of documents 
based on the asymmetric content similarities between each ... 

Keywords: affinity ranking, diversity, information retrieval, information richness, link 
analysis 



5 Building efficient and effective metasearch engines
Weiyi Meng, Clement Yu, King-Lup Liu 

March 2002 ACM Computing Surveys (CSUR), volume 34 issue 1
Publisher: ACM Press 


Frequently a user's information needs are stored in the databases of multiple search 
engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search 
engines and identify useful documents from the returned results. To support unified 
access to multiple search engines, a metasearch engine can be constructed. When a 
metasearch engine receives a query from a user, it invokes the underlying search engines 
to retrieve useful information for the user. Metasearch engines have ... 

Keywords: Collection fusion, distributed collection, distributed information retrieval, 
information resource discovery, metasearch 



6 World Wide Web: Predicting web actions from HTML content 
Brian D. Davison 

June 2002 Proceedings of the thirteenth ACM conference on Hypertext and 
hypermedia 

Publisher: ACM Press 


Most proposed Web prefetching techniques make predictions based on the historical 
references to requested objects. In contrast, this paper examines the accuracy of 
predicting a user's next action based on analysis of the content of the pages requested 
recently by the user. Predictions are made using the similarity of a model of the user's 
interest to the text in and around the hypertext anchors of recently requested Web pages. 
This approach can make predictions of actions that have never been ...

Keywords: WWW, information retrieval, prediction, prefetching, similarity, textual, user 
modeling 




7 Does "authority" mean quality? predicting expert quality ratings of Web documents
Brian Amento, Loren Terveen, Will Hill

July 2000 Proceedings of the 23rd annual international ACM SIGIR conference on
Research and development in information retrieval 

Publisher: ACM Press 

For many topics, the World Wide Web contains hundreds or thousands of relevant 
documents of widely varying quality. Users face a daunting challenge in identifying a small 
subset of documents worthy of their attention. 

Link analysis algorithms have received much interest recently, in large part for their 
potential to identify high quality items. We report here on an experimental evaluation of 
this potential. 

We evaluated a number of link and content-based algorithms using a dat ... 
Keywords: exploiting hyperlink structure 



8 Retrieving documents by plausible inference: a priliminary study
W. B. Croft, T. J. Lucia, P. R. Cohen 

May 1988 Proceedings of the 11th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: ACM Press 


Choosing an appropriate document representation and search strategy for document 
retrieval has been largely guided by achieving good average performance instead of 
optimizing the results for each individual query. A model of retrieval based on plausible 
inference gives us a different perspective and suggests that techniques should be found 
for combining multiple sources of evidence (or search strategies) into an overall 
assessment of a documents relevance, rather than attempting to pick a ... 

9 Web search 2: Entropy-based link analysis for mining web informative structures
Hung-Yu Kao, Ming-Syan Chen, Shian-Hua Lin, Jan-Ming Ho 

November 2002 Proceedings of the eleventh international conference on Information 
and knowledge management 

Publisher: ACM Press 


In this paper, we study the problem of mining the informative structure of a news Web 
site which consists of thousands of hyperlinked documents. We define the informative 
structure of a news Web site as a set of index pages (or referred to as TOC, i.e., table of 
contents, pages) and a set of article pages linked by TOC pages through informative links. 
It is noted that the Hyperlink Induced Topics Search (HITS) algorithm has been employed 
to provide a solution to analyzing authorities and hubs of ... 

Keywords: anchor text, entropy, hubs and authorities, information extraction, 
informative structure, link analysis 



10 Web: Query type classification for web document retrieval 
In-Ho Kang, GilChang Kim 

July 2003 Proceedings of the 26th annual international ACM SIGIR conference on
Research and development in information retrieval

Publisher: ACM Press 


The heterogeneous Web exacerbates IR problems and short user queries make them 
worse. The contents of web documents are not enough to find good answer documents. 
Link information and URL information compensates for the insufficiencies of content 
information. However, static combination of multiple evidences may lower the retrieval 
performance. We need different strategies to find target documents according to a query
type. We can classify user queries as three categories, the topic relevance tas ... 

Keywords: URL information, combination of multiple evidences, link information, query 
classification 



11 Automatic text summarization based on the Global Document Annotation 
Katashi Nagao, Koiti Hasida 

August 1998 Proceedings of the 17th international conference on Computational 
linguistics - Volume 2 , Proceedings of the 36th annual meeting on 
Association for Computational Linguistics - Volume 2 

Publisher: Association for Computational Linguistics , Association for Computational Linguistics 

The GDA (Global Document Annotation) project proposes a tag set which allows machines 
to automatically infer the underlying semantic/pragmatic structure of documents. Its 
objectives are to promote development and spread of NLP/AI applications to render GDA- 
tagged documents versatile and intelligent contents, which should motivate WWW (World 
Wide Web) users to tag their documents as part of content authoring. This paper 
discusses automatic text summarization based on GDA. Its main features are a ... 



12 Information retrieval session 7: web: Representing interests as a hyperlinked
document collection
Michelle Fisher, Richard Everson

November 2003 Proceedings of the twelfth international conference on Information 
and knowledge management 

Publisher: ACM Press 


We describe a latent variable model for representing a user's interests as a hyperlinked 
document collection. By collecting hyper-text documents that a user views, creates or 
updates whilst at their computer, we are able to use not only the content of these 
documents but also the inter-connectivity of the collection to model the user's interests. 
The model uses Probabilistic Latent Semantic Analysis and Probabilistic Hypertext Induced 
Topic Selection and decomposes the user's document collection ... 

Keywords: hyperlinked/hypertext document collections, information access, latent 
variable models, user interests 




13 Topic-based browsing within a digital library using keyphrases 
Steve Jones, Gordon Paynter 

August 1999 Proceedings of the fourth ACM conference on Digital libraries 
Publisher: ACM Press 





Keywords: automated hypertext generation, information exploration, information 
retrieval, keyphrase extraction 



14 Natural language processing for information retrieval 
David D. Lewis, Karen Sparck Jones 

January 1996 Communications of the ACM, Volume 39 issue 1
Publisher: ACM Press 

15 Enhanced hypertext categorization using hyperlinks
Soumen Chakrabarti, Byron Dom, Piotr Indyk 

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data SIGMOD '98, volume 27 issue 2
Publisher: ACM Press 


A major challenge in indexing unstructured hypertext databases is to automatically 
extract meta-data that enables structured search using topic taxonomies, circumvents 
keyword ambiguity, and improves the quality of search and profile-based routing and 
filtering. Therefore, an accurate classifier is an essential component of a hypertext 
database. Hyperlinks pose new problems not addressed in the extensive text classification 
literature. Links clearly contain high-quality semantic clues that ... 

16 Modeling and combining evidence provided by document relationships using
probabilistic argumentation systems

Justin Picard

August 1998 Proceedings of the 21st annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: ACM Press 




17 Image retrieval by hypertext links 

V. Harmandas, M. Sanderson, M. D. Dunlop 

July 1997 ACM SIGIR Forum, Proceedings of the 20th annual international ACM
SIGIR conference on Research and development in information retrieval
SIGIR '97, Volume 31 Issue SI
Publisher: ACM Press 





18 Multimedia and visualization: Dynamic structuring of web information for access
visualization

Jess Y. S. Mak, Hong Va Leong, Alvin T. S. Chan 

March 2002 Proceedings of the 2002 ACM symposium on Applied computing 
Publisher: ACM Press 


The Internet has led to the formation of a global information infrastructure. To explore a 
web site, a site map would be useful as a short cut for a user to locate for the target 
information in a structured and efficient manner, rather than drilling into the web site 
following hyperlinks, reading possibly irrelevant information. Useless information impacts 
a mobile web environment, where mobile clients are only connected with unreliable 
wireless channels of limited bandwidth. Structured web page ... 


Keywords: DOM, VRML, XML, visualization, web document structure 




19 Phrasier: a system for interactive document retrieval using keyphrases 
Steve Jones, Mark S. Staveley 

August 1999 Proceedings of the 22nd annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: ACM Press 

Keywords: evaluation, interactive retrieval interface, keyphrase-based retrieval, query
interface 



20 Session IX - coordination and decision making: Synview: the design of a system for
cooperative structuring of information
David G. Lowe 

December 1986 Proceedings of the 1986 ACM conference on Computer-supported 
cooperative work 

Publisher: ACM Press 


The SYNVIEW system implements cooperative structuring of information through an 
explicit representation for debate between the users of the system and through a voting 
mechanism for resolving disputes. This paper reviews the original design of the system 
and describes modifications that are necessary for near-term applications. In particular, 
we examine ways to interface to existing information in the form of traditional documents, 
and we describe simplifications to the debate representation tha ... 
21 Topical locality in the Web
Brian D. Davison 

July 2000 Proceedings of the 23rd annual international ACM SIGIR conference on
Research and development in information retrieval 

Publisher: ACM Press 


Most web pages are linked to others with related content. This idea, combined with 
another that says that text in, and possibly around, HTML anchors describe the pages to 
which they point, is the foundation for a usable World-Wide Web. In this paper, we 
examine to what extent these ideas hold by empirically testing whether topical locality 
mirrors spatial locality of pages on the Web. In particular, we find that the likelihood of 
linked pages having similar textual content to be ... 



22 KM-3 (knowledge management): knowledge extraction: Node ranking in labeled
directed graphs

Krishna P. Chitrapura, Srinivas R. Kashyap 

November 2004 Proceedings of the thirteenth ACM international conference on 
Information and knowledge management CIKM '04 

Publisher: ACM Press 


Our work is motivated by the problem of ranking hyper-linked documents for a given 
query. Given an arbitrary directed graph with edge and node labels, we present a new 
flow-based model and an efficient method to dynamically rank the nodes of this graph 
with respect to any of the original labels. Ranking documents for a given query in a 
hyper-linked document set and ranking of authors/articles for a given topic in a citation 
database are some typical applications of our method. We outline the ... 

Keywords: citation graph, context-sensitive ranking, flow-based, intranet search, link 
structure, model, pagerank, random surfer model, search, search in context, web graph 







23 Information access and retrieval: Multiple related document summary and navigation
using concept hierarchies for mobile clients
D. L. Chan, R. W. P. Luk, W. K. Mak, H. V. Leong, E. K. S. Ho, Q. Lu 
March 2002 Proceedings of the 2002 ACM symposium on Applied computing 
Publisher: ACM Press 

Mobile clients have limited display and navigation capabilities. To browse a set of 
documents, an intuitive method is to navigate through concept hierarchies. To reduce
semantic loading for each term that represents the concepts and the cognitive loading of 
users due to the limited display, similar documents are grouped together before concept 
hierarchies are constructed for each document group. Since the concept hierarchies only 
represent the salient concepts in the documents, term extraction i ... 

Keywords: browsing, concept hierarchy, information access, mobile agent, mobile 
computing, navigation, summarization 



24 Posters: Web page summarization using dynamic content
Adam Jatowt 

May 2004 Proceedings of the 13th international World Wide Web conference on 
Alternate track papers & posters 

Publisher: ACM Press 


Summarizing web pages have recently gained much attention from researchers. Until now 
two main types of approaches have been proposed for this task: content- and context- 
based methods. Both of them assume fixed content and characteristics of web documents 
without considering their dynamic nature. However the volatility of information published 
on the Internet argue for the implementation of more time-aware techniques. This paper 
proposes a new approach towards automatic web page description, whi ... 

Keywords: change detection, web document, web page summarization 
25 Components of GIR: Indexing and ranking in Geo-IR systems
Bruno Martins, Mario J. Silva, Leonardo Andrade 

November 2005 Proceedings of the 2005 workshop on Geographic information 
retrieval GIR '05 

Publisher: ACM Press 


This paper addresses document indexing and retrieval using geographical location. It 
discusses possible indexing structures and result ranking algorithms, surveying known 
approaches and showing how they can be combined to build an effective Geo-IR system. 

Keywords: Geo-IR, indexing, ranking, searching 



26 Effective site finding using link anchor information 
Nick Craswell, David Hawking, Stephen Robertson 

September 2001 Proceedings of the 24th annual international ACM SIGIR conference 
on Research and development in information retrieval 

Publisher: ACM Press 


Link-based ranking methods have been described in the literature and applied in 
commercial Web search engines. However, according to recent TREC experiments, they 
are no better than traditional content-based methods. We conduct a different type of
experiment, in which the task is to find the main entry point of a specific Web site. In our 
experiments, ranking based on link anchor text is twice as effective as ranking based on 
document content, even though both methods used the same BM25 ... 

27 QCS: a tool for querying, clustering, and summarizing documents
Daniel M. Dunlavy, John Conroy, Dianne P. O'Leary 

May 2003 Proceedings of the 2003 Conference of the North American Chapter of the 
Association for Computational Linguistics on Human Language 
Technology: Demonstrations - Volume 4 NAACL '03 
Publisher: Association for Computational Linguistics


The QCS information retrieval (IR) system is presented as a tool for querying, clustering, 
and summarizing document sets. QCS has been developed as a modular development 
framework, and thus facilitates the inclusion of new technologies targeting these three IR 
tasks. Details of the system architecture, the QCS interface, and preliminary results are 
presented. 

28 Link-based and content-based evidential information in a belief network model 
Ilmerio Silva, Berthier Ribeiro-Neto, Pavel Calado, Edleno Moura, Nivio Ziviani
July 2000 Proceedings of the 23rd annual international ACM SIGIR conference on
Research and development in information retrieval 
Publisher: ACM Press 


This work presents an information retrieval model developed to deal with hyperlinked 
environments. The model is based on belief networks and provides a framework for 
combining information extracted from the content of the documents with information 
derived from cross-references among the documents. The information extracted from the 
content of the documents is based on statistics regarding the keywords in the collection 
and is one of the basis for traditional information retrieval (IR) rankin ... 
This paper shows how citation-based information and structural content (e.g., title,
abstract) can be combined to improve classification of text documents into predefined 
categories. We evaluate different measures of similarity (five derived from the citation
information of the collection, and three derived from the structural content) and
determine how they can be fused to improve classification effectiveness. To discover the 
best fusion framework, we apply Genetic Programming (GP) techniqu ... 
The massive distribution of the crawling task can lead to inefficient exploration of the
same portion of the Web. We propose a technique to guide crawlers exploration based on
the notion of Web communities. The stability properties of the method can be used as an
implicit coordination mechanism to increase the efficiency of the crawling task. 
Creation and maintenance of links in large hypermedia documents is difficult. Motivated
by an application to a federal clinical practice guideline for cancer pain management, we 
have developed and evaluated a repertory grid-based linking scheme we call repertory 
hypergrids. Harnessing established knowledge acquisition techniques, the repertory 
hypergrid assigns each "knowledge chunk" a location in "context space". A chunk links to 
another chunk if th ... 
High-dimensional collections of 0-1 data occur in many applications. The attributes in
such data sets are typically considered to be unordered. However, in many cases there is
a natural total or partial order ≺ underlying the variables of the data set. Examples of
variables for which such orders exist include terms in documents, courses in enrollment
data, and paleontological sites in fossil data collections. The observations in such
applications are flat, unordered sets; however, the data s ... 
The primary goal of this PhD thesis is to align printable documents with meetings' dialogs.
This bi-modal alignment consists in bridging thematic links between documents' content
and speech transcripts' content. An obvious application is a system that automatically link 
document parts with audio-video extracts of a meeting. Further, this bi-modal alignment 
is considered for thematically segmenting both meeting dialogs and documents discussed 
during this meeting. 
The structure of the web is increasingly being used to improve organization, search, and
analysis of information on the web. For example, Google uses the text in citing documents 
(documents that link to the target document) for search. We analyze the relative utility of 
document text, and the text in citing documents near the citation, for classification and 
description. Results show that the text in citing documents, when available, often has 
greater discriminative and descriptive power than th ... 
We propose new features and algorithms for automating Web-page classification tasks
such as content recommendation and ad blocking. We show that the automated 
classification of Web pages can be much improved if, instead of looking at their textual 
content, we consider each links's URL and the visual placement of those links on a 
referring page. These features are unusual: rather than being scalar measurements like 
word counts they are tree structured— describing the position of the item ... 